33 research outputs found
Recommended from our members
Characterizing Audio Events for Video Soundtrack Analysis
There is an entire emerging ecosystem of amateur video recordings on the internet today, in addition to the abundance of more professionally produced content. The ability to automatically scan and evaluate the content of these recordings would be very useful for search and indexing, especially as amateur content tends to be more poorly labeled and tagged than professional content. Although the visual content is often considered to be of primary importance, the audio modality contains rich information which may be very helpful in the context of video search and understanding. Any technology that could help to interpret video soundtrack data would also be applicable in a number of other scenarios, such as mobile device audio awareness, surveillance, and robotics. In this thesis we approach the problem of extracting information from these kinds of unconstrained audio recordings. Specifically we focus on techniques for characterizing discrete audio events within the soundtrack (e.g. a dog bark or door slam), since we expect events to be particularly informative about content. Our task is made more complicated by the extremely variable recording quality and noise present in this type of audio. Initially we explore the idea of using the matching pursuit algorithm to decompose and isolate components of audio events. Using these components we develop an approach for non-exact (approximate) fingerprinting as a way to search audio data for similar recurring events. We demonstrate a proof of concept for this idea. Subsequently we extend the use of matching pursuit to build an actual audio fingerprinting system, with the goal of identifying simultaneously recorded amateur videos (i.e. videos taken in the same place at the same time by different people, which contain overlapping audio). Automatic discovery of these simultaneous recordings is one particularly interesting facet of general video indexing. We evaluate this fingerprinting system on a database of 733 internet videos. 
Next we return to searching for features to directly characterize soundtrack events. We develop a system to detect transient sounds and represent each audio clip as a histogram of the transients it contains. We use this representation for video classification over a database of 1873 internet videos. When we combine these features with a spectral feature baseline system we achieve a relative improvement of 7.5% in mean average precision over the baseline. In another attempt to devise features to better describe and compare events, we investigate decomposing audio using a convolutional form of non-negative matrix factorization, resulting in event-like spectro-temporal patches. We use the resulting representation to build an event detection system that is more robust to additive noise than a comparable baseline system. Lastly we investigate a promising feature representation that has been used by others previously to describe event-like sound effect clips. These features derive from an auditory model and are meant to capture fine time structure in sound events. We compare these features and a related but simpler feature set on the task of video classification over 9317 internet videos. We find that combinations of these features with baseline spectral features produce a significant improvement in mean average precision over the baseline.
Audio Fingerprinting to Identify Multiple Videos of an Event
The proliferation of consumer recording devices and video sharing websites makes it increasingly likely that multiple recordings of the same occurrence are available. These co-synchronous recordings can be identified via their audio tracks, despite local noise and channel variations. We explore a robust fingerprinting strategy to do this. Matching pursuit is used to obtain a sparse set of the most prominent elements in a video soundtrack. Pairs of these elements are hashed and stored, to be efficiently compared with one another. This fingerprinting is tested on a corpus of over 700 YouTube videos related to the 2009 U.S. presidential inauguration. Reliable matching of identical events in different recordings is demonstrated, even under difficult conditions.
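The pair-hashing step described in this abstract can be illustrated with a minimal sketch. The abstract only states that pairs of prominent elements are hashed and stored; everything else here is an assumption made for illustration: each element is reduced to a hypothetical `(time, freq)` integer tuple, the hash key is `(f1, f2, dt)`, and matches are scored by voting on a consistent time offset, as is common in landmark-style fingerprinting. The names `pair_hashes`, `build_index`, `best_match`, `max_dt`, and `fan_out` are not from the source.

```python
from collections import defaultdict


def pair_hashes(atoms, max_dt=32, fan_out=3):
    """Yield (key, anchor_time) for nearby pairs of atoms.

    atoms: iterable of (time, freq) integer tuples (an assumed, simplified
    stand-in for the elements selected by matching pursuit).
    """
    atoms = sorted(atoms)
    for i, (t1, f1) in enumerate(atoms):
        paired = 0
        for t2, f2 in atoms[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # atoms are time-sorted, so later ones are even further away
            if dt == 0:
                continue  # skip simultaneous atoms in this sketch
            # The key depends only on relative timing, so it survives a time shift.
            yield (f1, f2, dt), t1
            paired += 1
            if paired >= fan_out:
                break


def build_index(recordings):
    """Map each pair key to the (recording name, anchor time) entries that produced it."""
    index = defaultdict(list)
    for name, atoms in recordings.items():
        for key, t in pair_hashes(atoms):
            index[key].append((name, t))
    return index


def best_match(index, query_atoms):
    """Vote on (recording, time offset); many votes at one offset suggest overlapping audio."""
    votes = defaultdict(int)
    for key, t_query in pair_hashes(query_atoms):
        for name, t_ref in index.get(key, ()):
            votes[(name, t_ref - t_query)] += 1
    if not votes:
        return None
    (name, offset), count = max(votes.items(), key=lambda kv: kv[1])
    return name, offset, count


if __name__ == "__main__":
    recordings = {
        "a": [(0, 10), (5, 20), (9, 15), (20, 30), (26, 12)],
        "b": [(0, 40), (7, 41), (15, 42)],
    }
    index = build_index(recordings)
    # A clip that starts 5 frames into recording "a".
    query = [(0, 20), (4, 15), (15, 30), (21, 12)]
    print(best_match(index, query))
```

Because the key encodes only the two frequencies and the time difference, a clip that starts partway through another recording still produces the same keys; the start offset is recovered from the votes rather than stored in the hash.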
Spectral vs. spectro-temporal features for acoustic event detection
Automatic detection of different types of acoustic events is an interesting problem in soundtrack processing. Typical approaches to the problem use short-term spectral features to describe the audio signal, with additional modeling on top to take temporal context into account. We propose an approach to detecting and modeling acoustic events that directly describes temporal context, using convolutive non-negative matrix factorization (NMF). NMF is useful for finding parts-based decompositions of data; here it is used to discover a set of spectro-temporal patch bases that best describe the data, with the patches corresponding to event-like structures. We derive features from the activations of these patch bases, and perform event detection on a database consisting of 16 classes of meeting-room acoustic events. We compare our approach with a baseline using standard short-term mel frequency cepstral coefficient (MFCC) features. We demonstrate that the event-based system is more robust in the presence of added noise than the MFCC-based system, and that a combination of the two systems performs even better than either individually.
Joint Audio-Visual Signatures for Web Video Analysis
Presentation of video classification project
Joint Audio-Visual Signatures for Web Video Analysis
Presentation of video classification project, including the TRECVID MED2010 system
Effects of a high-dose 24-h infusion of tranexamic acid on death and thromboembolic events in patients with acute gastrointestinal bleeding (HALT-IT): an international randomised, double-blind, placebo-controlled trial
Background: Tranexamic acid reduces surgical bleeding and reduces death due to bleeding in patients with trauma.
Meta-analyses of small trials show that tranexamic acid might decrease deaths from gastrointestinal bleeding. We
aimed to assess the effects of tranexamic acid in patients with gastrointestinal bleeding.
Methods: We did an international, multicentre, randomised, placebo-controlled trial in 164 hospitals in 15 countries.
Patients were enrolled if the responsible clinician was uncertain whether to use tranexamic acid, were aged above the
minimum age considered an adult in their country (either aged 16 years and older or aged 18 years and older), and
had significant (defined as at risk of bleeding to death) upper or lower gastrointestinal bleeding. Patients were
randomly assigned by selection of a numbered treatment pack from a box containing eight packs that were identical
apart from the pack number. Patients received either a loading dose of 1 g tranexamic acid, which was added to a
100 mL infusion bag of 0·9% sodium chloride and infused by slow intravenous injection over 10 min, followed by a
maintenance dose of 3 g tranexamic acid added to 1 L of any isotonic intravenous solution and infused at 125 mg/h
for 24 h, or placebo (sodium chloride 0·9%). Patients, caregivers, and those assessing outcomes were masked to
allocation. The primary outcome was death due to bleeding within 5 days of randomisation; analysis excluded patients
who received neither dose of the allocated treatment and those for whom outcome data on death were unavailable.
This trial was registered with Current Controlled Trials, ISRCTN11225767, and ClinicalTrials.gov, NCT01658124.
Findings: Between July 4, 2013, and June 21, 2019, we randomly allocated 12 009 patients to receive tranexamic acid
(5994, 49·9%) or matching placebo (6015, 50·1%), of whom 11 952 (99·5%) received the first dose of the allocated
treatment. Death due to bleeding within 5 days of randomisation occurred in 222 (4%) of 5956 patients in the
tranexamic acid group and in 226 (4%) of 5981 patients in the placebo group (risk ratio [RR] 0·99, 95% CI 0·82–1·18).
Arterial thromboembolic events (myocardial infarction or stroke) were similar in the tranexamic acid group and
placebo group (42 [0·7%] of 5952 vs 46 [0·8%] of 5977; RR 0·92; 95% CI 0·60 to 1·39). Venous thromboembolic events (deep vein
thrombosis or pulmonary embolism) were higher in the tranexamic acid group than in the placebo group (48 [0·8%] of
5952 vs 26 [0·4%] of 5977; RR 1·85; 95% CI 1·15 to 2·98).
Interpretation: We found that tranexamic acid did not reduce death from gastrointestinal bleeding. On the basis of our
results, tranexamic acid should not be used for the treatment of gastrointestinal bleeding outside the context of a
randomised trial.
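The risk ratios in the Findings can be checked from the reported counts with a short calculation. The sketch below assumes the standard large-sample (log-scale Wald) interval for a risk ratio; the abstract does not say which method the trial used, so this is a plausibility check rather than a reproduction of the trial's analysis. The function name `risk_ratio` is illustrative.

```python
import math


def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Return (RR, lower, upper) using a Wald interval on the log scale.

    Assumption: z = 1.96 for an approximate 95% interval; the trial's own
    method may differ.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


if __name__ == "__main__":
    # Counts taken from the Findings paragraph (tranexamic acid vs placebo).
    outcomes = {
        "death due to bleeding within 5 days": (222, 5956, 226, 5981),
        "arterial thromboembolic events": (42, 5952, 46, 5977),
        "venous thromboembolic events": (48, 5952, 26, 5977),
    }
    for label, counts in outcomes.items():
        rr, lower, upper = risk_ratio(*counts)
        print(f"{label}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under this assumption the three results round to the figures reported in the abstract, including the venous thromboembolic interval that excludes 1.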
Finding Similar Acoustic Events Using Matching Pursuit and Locality-Sensitive Hashing
There are many applications for the ability to find repetitions of perceptually similar sound events in generic audio recordings. We explore the use of matching pursuit (MP) derived features to identify repeated patterns that characterize distinct acoustic events. We use locality-sensitive hashing (LSH) to efficiently search for similar items. We describe a method for detecting repetitions of events, and demonstrate performance on real data. Index Terms: Acoustic signal analysis, database searching
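The general idea of using LSH to retrieve similar feature vectors can be sketched as follows. The abstract does not say which LSH family the authors used, so this example assumes the random-hyperplane (cosine similarity) variant, and the class `LSHIndex` and its parameters `n_bits` and `n_tables` are hypothetical. Feature vectors here are plain lists of floats standing in for MP-derived features.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes
    )


class LSHIndex:
    """Several independent hash tables; vectors at a small angle tend to share a bucket."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        self.tables = [
            (make_hyperplanes(dim, n_bits, seed + i), defaultdict(list))
            for i in range(n_tables)
        ]

    def add(self, key, vec):
        for planes, buckets in self.tables:
            buckets[signature(vec, planes)].append(key)

    def query(self, vec):
        """Return candidate keys that share a bucket with vec in any table."""
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(signature(vec, planes), ()))
        return candidates


if __name__ == "__main__":
    index = LSHIndex(dim=4)
    index.add("event_1", [1.0, 0.2, -0.5, 0.3])
    index.add("event_2", [-0.7, 0.9, 0.1, -0.4])
    # A rescaled copy of event_1 points in the same direction, so it
    # lands in the same buckets as event_1.
    print(index.query([2.0, 0.4, -1.0, 0.6]))
```

Candidates returned by `query` would normally be re-ranked with an exact distance, since bucket collisions only indicate probable similarity.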